Abstract

The von Neumann entropy and the subentropy of a mixed quantum state 
are upper and lower bounds, respectively, on the accessible information of 
any ensemble consistent with the given mixed state. Here we define and 
investigate a set of quantities intermediate between entropy and subentropy. 



PACS numbers: 03.67.-a, 89.70.+c, 65.40.Gr



1 Entropy and subentropy 



The von Neumann entropy of a quantum state $\rho$ can be defined as

$$S(\rho) = -\mathrm{tr}\,\rho\ln\rho = -\sum_{j=1}^{n} \lambda_j \ln\lambda_j, \tag{1}$$

where $n$ is the dimension of $\rho$, the $\lambda$'s are its eigenvalues, and the expression
$x\ln x$, when evaluated at $x = 0$, is taken to have the value $\lim_{x\to 0} x\ln x = 0$.
The von Neumann entropy is of central importance in physics; when applied
to a thermal ensemble, it is the entropy of thermodynamics. In quantum
information theory it plays prominent roles in many contexts, e.g., in studies
of the classical capacity of a quantum channel [1], [2], [3] and the
compressibility of a quantum source [4]. To introduce the problem that we will be
considering here, we focus on the role that the von Neumann entropy plays
in Holevo's theorem [5], [6], [7], [8]. Part of the content of this theorem can be
stated as follows. Suppose we are handed a quantum object and are told that
it is in one of several possible pure states $|\psi_i\rangle$, $i = 1, \ldots, N$, the
probability of the state $|\psi_i\rangle$ being $p_i$. By measuring this single object, we
aim to get as much information as possible about the identity of the state,
that is, the value of the index $i$. The maximum amount we can obtain is called
the accessible information of the ensemble consisting of the ordered pairs
$(|\psi_i\rangle, p_i)$. In general there is no analytic formula for the accessible
information, but Holevo's theorem gives us a simple and general upper bound:
the accessible information is no greater than the von Neumann entropy of the
ensemble's density matrix

$$\rho = \sum_{i=1}^{N} p_i\, |\psi_i\rangle\langle\psi_i|. \tag{2}$$

Moreover, the von Neumann entropy (we will usually refer to it simply as the
entropy) is the least upper bound on the accessible information that depends
only on the density matrix $\rho$ and not on other details of the ensemble. To
see why this is true, note that the ensemble consisting of the eigenstates of
$\rho$, with the eigenvalues as weights, is an ensemble realizing the density
matrix $\rho$ and from which one can extract, in a single measurement, an amount
of information equal to $S(\rho)$. That is, the upper bound can be achieved.

It is natural also to ask about the analogous lower bound: what is the
greatest lower bound on the accessible information of an ensemble that
depends only on the ensemble's density matrix? This question has been
answered [9]: the greatest lower bound is the subentropy $Q(\rho)$, defined by

$$Q(\rho) = -\sum_{j=1}^{n} \left( \prod_{k \neq j} \frac{\lambda_j}{\lambda_j - \lambda_k} \right) \lambda_j \ln\lambda_j. \tag{3}$$

(If two or more of the eigenvalues $\lambda_j$ are equal, the value of $Q$ is determined
unambiguously by taking a limit starting with unequal eigenvalues.) Just as
the ensemble of eigenstates of $\rho$ has an accessible information that matches
the upper bound $S(\rho)$, there is a complementary ensemble, called the Scrooge
ensemble [9], that likewise realizes $\rho$ but has an accessible information equal
to the lower bound $Q(\rho)$.

Thus in this context of acquiring information from a single quantum sys- 
tem, the von Neumann entropy and its lesser known analog the subentropy 
play mirror-image roles and together define the range of possible values of 
the accessible information for a given density matrix. 
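As a small numerical illustration of Eqs. (1) and (3), the following sketch evaluates $S$ and $Q$ directly from a list of eigenvalues. It is not taken from the paper: the function names and the sample spectrum are chosen here for illustration, and the subentropy routine assumes that the nonzero eigenvalues are pairwise distinct (the degenerate case requires the limit mentioned above).

```python
import math

def entropy(lams):
    """Eq. (1): S = -sum_j lam_j ln lam_j, with 0 ln 0 taken to be 0."""
    return -sum(l * math.log(l) for l in lams if l > 0)

def subentropy(lams):
    """Eq. (3), evaluated term by term.

    Assumes the nonzero eigenvalues are pairwise distinct; equal
    eigenvalues would need a limiting procedure not implemented here.
    """
    total = 0.0
    for j, lj in enumerate(lams):
        if lj == 0:
            continue  # a zero eigenvalue contributes 0 * ln 0 = 0
        weight = 1.0
        for k, lk in enumerate(lams):
            if k != j:
                weight *= lj / (lj - lk)
        total += weight * lj * math.log(lj)
    return -total

# Hypothetical example spectrum (sums to one, all entries distinct).
lams = [0.5, 0.3, 0.2]
S = entropy(lams)
Q = subentropy(lams)
print(f"S = {S:.6f}, Q = {Q:.6f}")
```

For this spectrum the subentropy comes out well below the entropy, in line with their roles as lower and upper bounds on the accessible information.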

Comparing Eqs. (1) and (3) one sees a certain formal similarity between
$S$ and $Q$. The similarity is more striking if we rewrite both $S$ and $Q$ as
contour integrals [9]. One can write

$$S(\rho) = -\frac{1}{2\pi i} \oint (\ln z)\, \mathrm{tr}\,(I - \rho/z)^{-1}\, dz, \tag{4}$$

where the contour encloses all the nonzero eigenvalues of $\rho$. To make the
connection between Eq. (4) and Eq. (1), note that the eigenvalues of
$(I - \rho/z)^{-1}$ are $z/(z - \lambda_j)$, so that each term in the trace contributes a
residue that becomes a term in Eq. (1). Similarly, one can express $Q$ as

$$Q(\rho) = -\frac{1}{2\pi i} \oint (\ln z)\, \det\,(I - \rho/z)^{-1}\, dz. \tag{5}$$

Thus, where the trace appears in the formula for entropy, the determinant
appears in the formula for subentropy.

The formulas given in Eqs. (4) and (5) raise an interesting mathematical
issue which is the impetus for this paper. The trace and the determinant of
a matrix are simply the first and last of the coefficients in the characteristic
polynomial of the matrix. In place of the trace in Eq. (4) or the determinant
in Eq. (5), one could insert any of the other coefficients of this polynomial
and thereby identify new functions that might be regarded as natural
generalizations of entropy and subentropy. In what follows we define a set of
functions of $\rho$ based on this mathematical substitution and investigate their
properties. We call the functions $R_r^{(n)}$, $r = 1, \ldots, n$, with $R_1^{(n)}$ being equal
to $S$ and $R_n^{(n)}$ being equal to $Q$. Among the properties we will discover is
the string of inequalities $Q = R_n^{(n)} \le R_{n-1}^{(n)} \le \cdots \le R_1^{(n)} = S$, valid for any
density matrix $\rho$.

In some respects, the subentropy $Q$ is quite unlike the entropy $S$. For
example, $Q$ is not additive: if $\rho = \rho_1 \otimes \rho_2$, then $Q(\rho)$ is typically not the
same as $Q(\rho_1) + Q(\rho_2)$, whereas the entropy is always additive in this sense.
However, $Q$ does share with $S$ the following property. Suppose we augment
the state space and the density matrix $\rho$ by including $m$ extra dimensions
with zero weight. That is, we replace $\rho$ with $\rho \oplus 0_m$, where $0_m$ is the $m \times m$
zero matrix, in effect adding to the set of eigenvalues $(\lambda_1, \ldots, \lambda_n)$ $m$ additional
eigenvalues all equal to zero. One can see immediately from Eqs. (1) and (3)
that both $S$ and $Q$ remain invariant under this augmentation of the space.
Since we are looking for natural generalizations of $S$ and $Q$, it is interesting
to ask whether our new quantities $R_r^{(n)}$ also have this property.

We will find, in fact, that they do not. But we will be able to construct
simple convex combinations of the $R_r^{(n)}$'s that do remain invariant under the
addition of "null" dimensions. These particular linear combinations, called
$\mathcal{R}_\alpha$, are parameterized by the single continuous parameter $\alpha$ and interpolate
between $S$ and $Q$.

We are thus investigating in this paper various functions that generalize
von Neumann entropy and subentropy in a specific mathematical sense.
There is no guarantee, of course, that these functions will be of value for
physics. At the end of the paper we offer a speculative potential
interpretation of $\mathcal{R}_\alpha$ in quantum information theory but otherwise leave this
question for future investigation.

2 Definition of $R_r^{(n)}$

Given any $n \times n$ complex matrix $M$, the characteristic polynomial of $M$ is the
quantity $\det(\mu I - M)$ regarded as a function of $\mu$. If we write this polynomial
as

$$\det(\mu I - M) = \mu^n + \sum_{r=1}^{n} (-1)^r C_r(M)\, \mu^{n-r}, \tag{6}$$

then the coefficient $C_r(M)$ is given by

$$C_r(M) = \sum_{k_1 < \cdots < k_r} \left( \prod_{s=1}^{r} \nu_{k_s} \right), \tag{7}$$

the $\nu$'s being the eigenvalues of $M$. Thus the index $r$ indicates the number of
eigenvalues being multiplied together in each term.$^1$ The coefficient $C_1(M)$
is the trace of $M$, and $C_n(M)$ is the determinant.
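To make Eqs. (6) and (7) concrete, here is a minimal sketch that computes the coefficients $C_r$ as sums of products of $r$ distinct eigenvalues and checks that they reproduce the characteristic polynomial. The helper names and the sample eigenvalues are illustrative choices, not part of the paper.

```python
from itertools import combinations
from math import prod

def char_poly_coefficient(eigs, r):
    """C_r of Eq. (7): sum over k_1 < ... < k_r of the product of r eigenvalues."""
    return sum(prod(subset) for subset in combinations(eigs, r))

def char_poly_value(eigs, mu):
    """Right-hand side of Eq. (6), evaluated at the number mu."""
    n = len(eigs)
    return mu**n + sum(
        (-1)**r * char_poly_coefficient(eigs, r) * mu**(n - r)
        for r in range(1, n + 1)
    )

# Hypothetical eigenvalues of some 3 x 3 matrix M.
eigs = [1, 2, 3]
coeffs = [char_poly_coefficient(eigs, r) for r in (1, 2, 3)]
print(coeffs)                      # C_1 is the trace, C_3 the determinant
print(char_poly_value(eigs, 4))    # should agree with prod(4 - nu) over the eigenvalues
```

The first and last entries of `coeffs` are the trace and the determinant; the middle entry is the kind of intermediate coefficient that the definition of $R_r^{(n)}$ makes use of.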

By analogy with Eqs. (4) and (5), we now define a set of quantities $R_r^{(n)}$
as follows:

$$R_r^{(n)}(\rho) = -\binom{n-1}{r-1}^{-1} \frac{1}{2\pi i} \oint (\ln z)\, C_r\!\left[ (I - \rho/z)^{-1} \right] dz, \tag{8}$$

where again the contour is chosen to enclose all the nonzero eigenvalues of
$\rho$, and $\binom{n-1}{r-1}$ is the binomial coefficient $\frac{(n-1)!}{(r-1)!\,(n-r)!}$. We have included this
factor because, as we will see in the following sections, it places the functions
$R_r^{(n)}$ between $S$ and $Q$. Note that, as promised, $R_r^{(n)}(\rho)$ is equal to $S(\rho)$ for
$r = 1$ and to $Q(\rho)$ for $r = n$.

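It is straightforward to evaluate the integral in Eq. (8) so as to write $R_r^{(n)}$
explicitly as a function of the eigenvalues of $\rho$. One finds that

$$R_r^{(n)} = -\binom{n-1}{r-1}^{-1} \sum_{k_1 < \cdots < k_r} \sum_{s=1}^{r} \left[ \prod_{t \neq s} \frac{\lambda_{k_s}}{\lambda_{k_s} - \lambda_{k_t}} \right] \lambda_{k_s} \ln \lambda_{k_s}. \tag{9}$$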



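For $r = 2, \ldots, n$, we can rearrange the indices to get an expression more
analogous to Eq. (3):

$$R_r^{(n)} = -\sum_{j=1}^{n} \left\{ \binom{n-1}{r-1}^{-1} \sum_{\substack{k_1 < \cdots < k_{r-1} \\ \text{each } k_s \neq j}} \left[ \prod_{s=1}^{r-1} \frac{\lambda_j}{\lambda_j - \lambda_{k_s}} \right] \right\} \lambda_j \ln \lambda_j. \tag{10}$$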



$^1$ We adopt the convention that there are always exactly $n$ eigenvalues of an $n \times n$
matrix: if a root $\mu = \nu$ of the equation $\det(\mu I - M) = 0$ has multiplicity $m$, we say that
$m$ of the $n$ eigenvalues of $M$ have the value $\nu$.
Notice that the number of terms in the sum over $k_1, \ldots, k_{r-1}$ is $\binom{n-1}{r-1}$,
because there are $n - 1$ index values from which to choose, the value $j$ being
disallowed. Thus the quantity in curly brackets is an average of the kind of
product that appears in the expression (3) for $Q$.

As in the case of $Q$, in order to evaluate Eq. (10) when two or more of the
eigenvalues $\lambda_j$ are equal, we have to take a limit. That the limit is unique is
guaranteed by Eq. (8), which has a unique value for all density matrices $\rho$.
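The explicit eigenvalue formula can be checked numerically. The sketch below is an illustration rather than the authors' code: it evaluates the double sum over index subsets $k_1 < \cdots < k_r$ divided by the binomial coefficient $\binom{n-1}{r-1}$, for a hypothetical spectrum whose nonzero eigenvalues are assumed pairwise distinct so that no limit is needed. It then lists $R_r^{(n)}$ for $r = 1, \ldots, n$.

```python
import math
from itertools import combinations

def q_like_sum(subset):
    """Inner sum over s for one index subset (sign not yet applied).

    Assumes the nonzero entries of `subset` are pairwise distinct.
    """
    total = 0.0
    for s, ls in enumerate(subset):
        if ls == 0:
            continue  # 0 * ln 0 is taken to be 0
        weight = 1.0
        for t, lt in enumerate(subset):
            if t != s:
                weight *= ls / (ls - lt)
        total += weight * ls * math.log(ls)
    return total

def R(lams, r):
    """R_r^(n) from the explicit eigenvalue formula."""
    n = len(lams)
    total = sum(q_like_sum(subset) for subset in combinations(lams, r))
    return -total / math.comb(n - 1, r - 1)

# Hypothetical spectrum with distinct eigenvalues summing to one.
lams = [0.5, 0.3, 0.15, 0.05]
values = [R(lams, r) for r in range(1, len(lams) + 1)]
for r, v in enumerate(values, start=1):
    print(f"R_{r} = {v:.6f}")
```

The first value coincides with the entropy and the last with the subentropy, and for this example the intermediate values fall strictly between them in decreasing order of $r$.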

Though we have already written the functions $R_r^{(n)}$ in a few ways, it
will be helpful to re-express these functions in quite different terms in order
to derive certain properties. This re-expression is the goal of the following
section.



3 Another path to $R_r^{(n)}$



Let us return to the problem of ascertaining the quantum state of a single
quantum system, given the ensemble $\{(|\psi_i\rangle, p_i)\}$. In addition to being a lower
bound on the amount of information one can gain when one makes the best
possible measurement, the subentropy $Q(\rho)$ is also the average information
one obtains about the state, where the average is over all complete orthogonal
measurements. (Indeed, the latter fact is sufficient to prove that $Q$ is a lower
bound on the accessible information.) Interpreting $Q$ as this average leads
to another way of expressing $Q$ mathematically [9]:

$$Q(\rho) = -n \int \Big( \sum_i \lambda_i x_i \Big) \ln \Big( \sum_i \lambda_i x_i \Big)\, dx + n \int x_1 \ln x_1\, dx. \tag{11}$$

Here the $x_i$'s are non-negative real numbers constrained to sum to unity;
that is, the ordered set $x = (x_1, \ldots, x_n)$ represents a point in the probability
space, or probability simplex, appropriate for a set of $n$ possibilities. The
integrals in Eq. (11) are integrals over this probability space, the measure
being the uniform measure normalized to unity. Explicitly, for any function
$g$,

$$\int g(x)\, dx = (n-1)! \int_0^1 \int_0^{1 - x_1} \cdots \int_0^{1 - x_1 - \cdots - x_{n-2}} g(x)\, dx_{n-1} \cdots dx_2\, dx_1, \tag{12}$$

where the prefactor $(n-1)!$ is the reciprocal of the volume of the integration
region, so that the constant function $g = 1$ integrates to unity.

In Eq. (11) there is no special significance to the index 1 that appears in the
second integral. Because of the symmetry of the measure, any other of the
$x_i$'s could equally well have been chosen. In fact, we can write the integral
more symmetrically as follows:

$$Q(\rho) = n \int f(x)\, dx, \tag{13}$$

where

$$f(x) = -\Big( \sum_i \lambda_i x_i \Big) \ln \Big( \sum_i \lambda_i x_i \Big) + \sum_i \lambda_i x_i \ln x_i. \tag{14}$$

Interestingly, the entropy $S(\rho) = R_1^{(n)}(\rho)$ can be written in an analogous
form. We simply need to replace the integral $\int (\cdots)\, dx$ in Eq. (13) with
a discrete sum over the extreme points of the probability simplex. That is,
instead of integrating over all points $x = (x_1, \ldots, x_n)$, we sum over the special
points $x^{(1)} = (1, 0, \ldots, 0)$, $x^{(2)} = (0, 1, 0, \ldots, 0)$, $\ldots$, $x^{(n)} = (0, \ldots, 0, 1)$.
Again, we take the total weight of all these points to be unity. Thus, starting
with Eq. (13) we perform the modification

$$\int f(x)\, dx \;\to\; \frac{1}{n} \sum_{j=1}^{n} f(x^{(j)}), \tag{15}$$

which brings us to

$$n \Big( \frac{1}{n} \Big) \sum_{j=1}^{n} f(x^{(j)}) = -\sum_{j=1}^{n} \lambda_j \ln \lambda_j = S(\rho). \tag{16}$$

It turns out that the quantities $R_r^{(n)}$ for other values of $r$ can likewise be
expressed as in Eq. (13) but with different ranges of integration. We have
just seen that $R_1^{(n)}$, which is the entropy itself, can be expressed in this way
if the "integral" is taken to be over the discrete set of extreme points of the
simplex. As we will show shortly, $R_2^{(n)}$ is similarly given by Eq. (13), but
with the integral being taken over the edges of the simplex, that is, over
those points $x$ having at most two nonzero components. (Again the measure
is uniform in the Euclidean sense and normalized to unity.) And in general,
$R_r^{(n)}$ is given by the same expression, but with the integral being over all
points $x$ having at most $r$ nonzero components.
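The edge case $r = 2$ is simple enough to test numerically. The sketch below, which is only an illustration under stated assumptions, averages the integrand $f(x) = -(\sum_i \lambda_i x_i)\ln(\sum_i \lambda_i x_i) + \sum_i \lambda_i x_i \ln x_i$ over every edge of the simplex with a midpoint rule, multiplies by $n$, and compares the result with the $r = 2$ eigenvalue formula. The step count, the function names, and the sample spectrum are choices made here, and the closed form assumes distinct eigenvalues.

```python
import math
from itertools import combinations

def xlogx(y):
    """y ln y with the convention 0 ln 0 = 0."""
    return y * math.log(y) if y > 0 else 0.0

def f(lams, x):
    """Integrand f(x) on the probability simplex."""
    mean = sum(l * xi for l, xi in zip(lams, x))
    return -xlogx(mean) + sum(l * xlogx(xi) for l, xi in zip(lams, x))

def R2_edge_integral(lams, steps=4000):
    """n times the uniform average of f over all edges (midpoint rule)."""
    n = len(lams)
    total = 0.0
    for i, j in combinations(range(n), 2):
        edge_sum = 0.0
        for m in range(steps):
            t = (m + 0.5) / steps
            x = [0.0] * n
            x[i], x[j] = t, 1.0 - t   # only two nonzero components
            edge_sum += f(lams, x)
        total += edge_sum / steps     # normalized integral over this edge
    return n * total / math.comb(n, 2)

def R2_closed_form(lams):
    """r = 2 case of the eigenvalue formula (distinct eigenvalues assumed)."""
    n = len(lams)
    total = sum((a * xlogx(a) - b * xlogx(b)) / (a - b)
                for a, b in combinations(lams, 2))
    return -total / math.comb(n - 1, 1)

lams = [0.5, 0.3, 0.2]  # hypothetical spectrum
numeric = R2_edge_integral(lams)
exact = R2_closed_form(lams)
print(f"edge integral: {numeric:.6f}, closed form: {exact:.6f}")
```

The two numbers should agree to within the quadrature error, which is a quick sanity check on the claim before reading its proof.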

To prove this claim, let us set up the integral $I_r^{(n)}$ that we have just
described:

$$I_r^{(n)} = \binom{n}{r}^{-1} n \sum_{k_1 < \cdots < k_r} \int_{k_1, \ldots, k_r} f(x)\, dx. \tag{17}$$

Here $\int_{k_1, \ldots, k_r} (\cdots)\, dx$ is the integral over the "face" of the simplex in which only
$x_{k_1}, \ldots, x_{k_r}$ are nonzero, with the measure normalized to unity. There are
$\binom{n}{r}$ terms in the sum, so we have divided by $\binom{n}{r}$ to ensure that the measure
of the entire region over which we are integrating (that is, the collection of all
the relevant faces) is normalized to unity. We wish to show that $I_r^{(n)} = R_r^{(n)}$.
Consider first the integral over just one face,

$$\int_{k_1, \ldots, k_r} f(x)\, dx. \tag{18}$$



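We can regard this integral as being over a complete probability space, but
with only $r$ possibilities instead of $n$. Therefore, if we multiply it by $r$, we
see from Eq. (13) that we get something formally similar to $Q$: not the $Q$ of
the original density matrix $\rho$ but rather of an effective $r$-dimensional density
matrix whose (unnormalized) eigenvalues are $\lambda_{k_1}, \ldots, \lambda_{k_r}$. (The equivalence
between Eq. (13) and Eq. (3) does not depend on the $\lambda$'s adding up to unity
[9].) That is, from Eq. (3) we have

$$\int_{k_1, \ldots, k_r} f(x)\, dx = -\frac{1}{r} \sum_{s=1}^{r} \left[ \prod_{t \neq s} \frac{\lambda_{k_s}}{\lambda_{k_s} - \lambda_{k_t}} \right] \lambda_{k_s} \ln \lambda_{k_s}. \tag{19}$$

Inserting this expression into Eq. (17), we get

$$I_r^{(n)} = -\binom{n-1}{r-1}^{-1} \sum_{k_1 < \cdots < k_r} \sum_{s=1}^{r} \left[ \prod_{t \neq s} \frac{\lambda_{k_s}}{\lambda_{k_s} - \lambda_{k_t}} \right] \lambda_{k_s} \ln \lambda_{k_s}, \tag{20}$$

which according to Eq. (9) is equal to $R_r^{(n)}$. We have, therefore,

$$R_r^{(n)} = I_r^{(n)} = \binom{n}{r}^{-1} n \sum_{k_1 < \cdots < k_r} \int_{k_1, \ldots, k_r} f(x)\, dx, \tag{21}$$

as claimed.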

We can thus write all the quantities $R_r^{(n)}$ as normalized integrals of the
same integrand, but with different ranges of integration.
4 Ordering the $R$'s



In this section we use the form just derived to prove the string of inequalities
mentioned in the introduction:

$$Q(\rho) = R_n^{(n)}(\rho) \le R_{n-1}^{(n)}(\rho) \le \cdots \le R_1^{(n)}(\rho) = S(\rho), \tag{22}$$

which hold for every $n \times n$ density matrix $\rho$. We will show, in fact, that all
the inequalities are strict except when $\rho$ is pure, in which case $R_r^{(n)} = 0$ for
every $r$. Since each function $R_r^{(n)}$ depends only on the eigenvalues $\lambda_1, \ldots, \lambda_n$,
which are non-negative and sum to unity, we can alternatively think of $R_r^{(n)}$
as a function on the probability space for a set of $n$ possibilities. If we picture
each of these functions as a "surface" plotted over the probability space, our
inequalities tell us that the surfaces corresponding to different values of $r$ do
not cross each other and coincide only at the extreme points of the simplex.

To prove the (non-strict) inequalities (22), we first prove that the function
$f$ defined in Eq. (14) is a convex function of $x$ for every set of allowed values of
the $\lambda$'s. We do this by extending the definition (14) to all non-negative values
of the $x_i$'s (that is, we allow $x$ to be unnormalized) and showing that $f$ is
convex even in this larger set. Treating the $x_i$'s as independent variables,
and for the moment restricting our attention to the case where they are all
strictly positive, let us compute the matrix of second derivatives of $f$:

$$\frac{\partial^2 f}{\partial x_i\, \partial x_j} = \frac{1}{\sum_k \lambda_k x_k}\, M_{ij}, \qquad M_{ij} = \Big( \sum_k \lambda_k x_k \Big) \frac{\lambda_i}{x_i}\, \delta_{ij} - \lambda_i \lambda_j. \tag{23}$$

We show that the matrix $M$ is non-negative definite by considering its
expectation value with respect to an arbitrary real vector $v$. Using Dirac notation,
we have

$$\langle v|M|v \rangle = \Big( \sum_i \frac{\lambda_i v_i^2}{x_i} \Big) \Big( \sum_i \lambda_i x_i \Big) - \Big( \sum_i \lambda_i v_i \Big)^2. \tag{24}$$

But if we define new vectors $w$ and $z$ by $w_i = v_i \sqrt{\lambda_i / x_i}$ and $z_i = \sqrt{\lambda_i x_i}$, then
we can write this equation as

$$\langle v|M|v \rangle = \langle w|w \rangle \langle z|z \rangle - \langle w|z \rangle^2, \tag{25}$$

whose right-hand side is non-negative by the Schwarz inequality. Because
$M$ is related to $\partial^2 f / \partial x_i \partial x_j$ by a positive factor, it follows that $f$ is a convex
function of $x$, at least when each $x_i$ is greater than zero. But by continuity,
the convexity extends to those points where some of the components $x_i$ are
zero.

We will also need strict convexity in certain cases, and for this we need to
take into account the possibility that some of the $\lambda$'s might be zero. Suppose
that $\lambda_{k_1}, \ldots, \lambda_{k_s}$ are nonzero and that all the other $\lambda$'s are zero. Notice that
in that case the right-hand side of Eq. (25) is zero only when the components
$(v_{k_1}, \ldots, v_{k_s})$ of $v$ are proportional to the corresponding components
$(x_{k_1}, \ldots, x_{k_s})$ of $x$. But $v$ defines the direction along which we are taking the
second derivative of $f$. Therefore if we consider a line containing two values
of $(x_{k_1}, \ldots, x_{k_s})$ that are not proportional to each other, the second derivative
of $f$ along this line is strictly positive, so that $f$ is strictly convex along this
line. (The second derivative might approach infinity as some components $x_i$
approach zero, but this pathology does not ruin the convexity.) We will need
this fact shortly.

We now use the convexity of $f$ to prove the inequalities (22), beginning
with the first one: $R_n^{(n)} \le R_{n-1}^{(n)}$. Consider any point $x = (x_1, \ldots, x_n)$ in the
probability simplex that is not one of the extreme points. We can write $x$ as

$$(x_1, \ldots, x_n) = \frac{1}{n-1} \Big\{ (1 - x_1)\,[(0, x_2, \ldots, x_n)/(1 - x_1)] + (1 - x_2)\,[(x_1, 0, x_3, \ldots, x_n)/(1 - x_2)] + \cdots + (1 - x_n)\,[(x_1, \ldots, x_{n-1}, 0)/(1 - x_n)] \Big\}. \tag{26}$$

Notice that the vectors in square brackets are all properly normalized, and
that the coefficients multiplying them, that is, $(1 - x_1)/(n - 1), \ldots,$
$(1 - x_n)/(n - 1)$, add up to one. We have thus written the vector $x$ as
an average of other legitimate probability vectors. From the convexity of $f$,
it follows then that

$$f(x) \le \frac{1}{n-1} \Big\{ (1 - x_1)\, f[(0, x_2, \ldots, x_n)/(1 - x_1)] + (1 - x_2)\, f[(x_1, 0, x_3, \ldots, x_n)/(1 - x_2)] + \cdots + (1 - x_n)\, f[(x_1, \ldots, x_{n-1}, 0)/(1 - x_n)] \Big\}. \tag{27}$$

Moreover, if any two of the $\lambda_j$'s are nonzero, and if the corresponding
components $x_i$ are also nonzero (we are about to integrate over all $x$, so that
this latter condition is almost always met), then for at least one pair of the
normalized vectors appearing in Eq. (26), the line connecting them is a line
along which $f$ is strictly convex. Thus in this case the inequality in Eq. (27)
is strict.

We now integrate both sides of the inequality (27) over the whole
probability simplex, again using our normalized measure. To see what this
integration does to the right-hand side, let us consider for now just the first
term,

$$\frac{1}{n-1} \int (1 - x_1)\, f[(0, x_2, \ldots, x_n)/(1 - x_1)]\, dx. \tag{28}$$

We perform the integral by first integrating over each surface that has a fixed
value of $x_1$, and then integrating over $x_1$. The expression in Eq. (28) becomes

$$\frac{1}{n-1}\, \frac{\int_0^1 (1 - x_1)(1 - x_1)^{n-2}\, dx_1}{\int_0^1 (1 - x_1)^{n-2}\, dx_1} \int_{2, \ldots, n} f(x)\, dx. \tag{29}$$

Here the factor of $(1 - x_1)^{n-2}$ comes from the fact that the area of the surface
defined by a fixed value of $x_1$ is proportional to $(1 - x_1)^{n-2}$. The denominator
provides the proper normalization. Evaluating the integrals over $x_1$ brings
the expression in Eq. (29) to

$$(1/n) \int_{2, \ldots, n} f(x)\, dx. \tag{30}$$

We can treat the other terms on the right-hand side of Eq. (27) in the same
way, so that upon integration, this inequality becomes

$$\int f(x)\, dx \le (1/n) \sum_{k_1 < \cdots < k_{n-1}} \int_{k_1, \ldots, k_{n-1}} f(x)\, dx. \tag{31}$$

Multiplying both sides by $n$ and using Eq. (21), we have

$$R_n^{(n)} \le R_{n-1}^{(n)}, \tag{32}$$

with equality holding only if just one of the $\lambda$'s is nonzero, that is, if $\rho$ is
pure.

The other inequalities in Eq. (22) can be obtained by a similar argument.
Consider any face of the probability simplex in which only $r$ of the
components $x_i$ are nonzero. Each point $x$ on such a face can be decomposed
as in Eq. (26), and the above argument gives us an inequality analogous to
Eq. (31):

$$\int_{k_1, \ldots, k_r} f(x)\, dx \le \frac{1}{r} \left[ \int_{k_2, \ldots, k_r} f(x)\, dx + \int_{k_1, k_3, \ldots, k_r} f(x)\, dx + \cdots + \int_{k_1, \ldots, k_{r-1}} f(x)\, dx \right]. \tag{33}$$

We now insert this inequality into the expression (21) for $R_r^{(n)}$:

$$R_r^{(n)} = \binom{n}{r}^{-1} n \sum_{k_1 < \cdots < k_r} \int_{k_1, \ldots, k_r} f(x)\, dx \le \binom{n}{r}^{-1} n\, \frac{1}{r}\, [n - (r - 1)] \sum_{k_1 < \cdots < k_{r-1}} \int_{k_1, \ldots, k_{r-1}} f(x)\, dx. \tag{34}$$

Here the factor of $n - (r - 1)$ comes from the following fact: given any set $A$
of $r - 1$ distinct index values [which defines the range of one of the integrals
on the right-hand side of Eq. (34)], there are $n - (r - 1)$ sets of $r$ distinct
index values from which $A$ could have been obtained by the deletion of one
value, so that each integral associated with the set $A$ appears $n - (r - 1)$
times. Simplifying the factors in Eq. (34), we get

$$R_r^{(n)} \le \binom{n}{r-1}^{-1} n \sum_{k_1 < \cdots < k_{r-1}} \int_{k_1, \ldots, k_{r-1}} f(x)\, dx = R_{r-1}^{(n)}. \tag{35}$$

Moreover, by an argument similar to what we used before, equality holds
only if $\rho$ is pure. This completes our proof of the string of inequalities (22).


5 Other properties of $R_r^{(n)}$

In this section we demonstrate various other properties of $R_r^{(n)}$. In particular:
(i) we show that as a function of $\lambda = (\lambda_1, \ldots, \lambda_n)$, $R_r^{(n)}$ is concave; (ii) we
find the maximum value of $R_r^{(n)}$; (iii) we determine how $R_r^{(n)}$ is affected by
the addition of extra dimensions with zero eigenvalues.

(i) $R_r^{(n)}$ is concave. We showed earlier that the quantity $f$ of Eq. (14),
regarded as a function of $x$, is convex. It is easier to see that as a function
of $\lambda = (\lambda_1, \ldots, \lambda_n)$ (with $\sum_i \lambda_i = 1$), $f$ is concave: the function $-y \ln y$ is
concave in $y$, and apart from a linear term, our function $f$ is of this form,
with $y$ being a linear function of the $\lambda$'s. According to Eq. (21), $R_r^{(n)}$ is a
sum of these concave functions and is therefore concave itself.

(ii) Maximum value of $R_r^{(n)}$. Because $R_r^{(n)}$ is concave and is symmetric under
interchange of the $\lambda_j$'s, it must achieve its maximum value when all the $\lambda_j$'s
are equal, in which case they are all equal to $1/n$. It is probably easiest to
obtain this maximum value explicitly via Eq. (21). Upon doing the integral,
one finds that for $r = 2, \ldots, n$,

$$\text{maximum of } R_r^{(n)} = \ln n - \Big( \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{r} \Big). \tag{36}$$
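The maximum value can be cross-checked with a one-dimensional quadrature. With all eigenvalues equal to $1/n$, the integrand on a face with $r$ nonzero components reduces to $f(x) = (\ln n)/n + (1/n)\sum_i x_i \ln x_i$, so the maximum is $\ln n + r\,E[x_1 \ln x_1]$, where the expectation is over the uniform measure on an $r$-point simplex. The sketch below assumes the standard fact that the marginal density of a single coordinate under that measure is $(r-1)(1-x)^{r-2}$ (the same surface-area factor used in the ordering proof); the function names and step count are illustrative choices.

```python
import math

def max_R_closed_form(n, r):
    """ln n - (1/2 + 1/3 + ... + 1/r), stated for r = 2, ..., n."""
    return math.log(n) - sum(1.0 / k for k in range(2, r + 1))

def max_R_by_quadrature(n, r, steps=20000):
    """ln n + r * E[x ln x], with x distributed as one coordinate of a
    uniformly random point on an r-point simplex (assumed marginal
    density (r - 1) * (1 - x)**(r - 2)), using the midpoint rule."""
    expectation = 0.0
    for m in range(steps):
        x = (m + 0.5) / steps
        density = (r - 1) * (1.0 - x) ** (r - 2)
        expectation += x * math.log(x) * density
    expectation /= steps
    return math.log(n) + r * expectation

for n, r in [(3, 2), (4, 3), (6, 6)]:
    print(n, r, max_R_closed_form(n, r), max_R_by_quadrature(n, r))
```

For each pair $(n, r)$ the two columns should agree to within the quadrature error.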



(iii) Adding null dimensions. For many purposes, a density matrix in $n$
dimensions can be regarded equally well as a density matrix in $m$ dimensions
with $m > n$, but with $m - n$ additional eigenvalues that are all zero. As
we mentioned in the introduction, the entropy $S(\rho)$ does not change if one
adds dimensions in this way (just as the Shannon entropy does not change if
one imagines additional possibilities all having zero probability), and neither
does the subentropy $Q$. It is interesting that in the case of $Q$ this invariance
follows immediately from the form of Eq. (5): supplementing $\rho$ with
extra zero eigenvalues means supplementing the matrix $(I - \rho/z)^{-1}$ with
extra eigenvalues all equal to 1, and these eigenvalues do not change the
determinant.

As we have said, however, our intermediate quantities $R_r^{(n)}$ for $r = 2, \ldots,$
$n - 1$ do not behave so simply upon addition of null dimensions. From Eq. (9)
one can show that adding $m$ zero eigenvalues to what was originally an $n \times n$
density matrix has the following effect on $R_r$:

$$R_r^{(n+m)} = \binom{n+m-1}{r-1}^{-1} \sum_{s} \binom{m}{s} \binom{n-1}{r-s-1} R_{r-s}^{(n)}, \tag{37}$$

where $s$ counts the zero eigenvalues included in a given term and ranges over
the values for which $1 \le r - s \le n$ and $0 \le s \le m$.

It is worth checking that this equation is consistent with our assertion that
both $S$ and $Q$ are invariant under the addition of zero eigenvalues. The
entropy in $n + m$ dimensions is $S^{(n+m)} = R_1^{(n+m)}$. Setting $r = 1$ in the
above equation gives us just one term, the one with $s = 0$, and we see that
$R_1^{(n+m)} = R_1^{(n)}$. Similarly for the subentropy, $Q^{(n+m)} = R_{n+m}^{(n+m)}$: if we set
$r = n + m$ in the above equation, we find again that only one term survives,
the one with $s = m$, and that $R_{n+m}^{(n+m)} = R_n^{(n)}$.
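The effect of null dimensions can be explored numerically. The sketch below compares a direct evaluation of $R_r^{(n+m)}$ on a spectrum padded with $m$ zeros against the combination $\binom{n+m-1}{r-1}^{-1} \sum_s \binom{m}{s}\binom{n-1}{r-s-1} R_{r-s}^{(n)}$. That explicit combination is written out here as an assumption: it is what one obtains by counting, in the subset sum of Eq. (9), how many subsets contain exactly $s$ of the zero eigenvalues, and it reduces to the two single-term cases just discussed. Function names and the sample spectrum are illustrative, and the nonzero eigenvalues are assumed distinct.

```python
import math
from itertools import combinations

def R(lams, r):
    """R_r^(n) from the subset sum; nonzero eigenvalues assumed distinct."""
    n = len(lams)
    total = 0.0
    for subset in combinations(lams, r):
        for s, ls in enumerate(subset):
            if ls == 0:
                continue  # 0 * ln 0 = 0
            weight = 1.0
            for t, lt in enumerate(subset):
                if t != s:
                    weight *= ls / (ls - lt)
            total += weight * ls * math.log(ls)
    return -total / math.comb(n - 1, r - 1)

def R_augmented(lams, m, r):
    """Assumed augmentation rule: R_r^(n+m) in terms of the R_{r-s}^(n)."""
    n = len(lams)
    total = 0.0
    for s in range(max(0, r - n), min(m, r - 1) + 1):
        total += math.comb(m, s) * math.comb(n - 1, r - s - 1) * R(lams, r - s)
    return total / math.comb(n + m - 1, r - 1)

lams = [0.5, 0.3, 0.2]   # hypothetical spectrum, n = 3
m = 2                    # number of added zero eigenvalues
for r in range(1, len(lams) + m + 1):
    direct = R(lams + [0.0] * m, r)
    via_rule = R_augmented(lams, m, r)
    print(r, round(direct, 6), round(via_rule, 6))
```

In this example the end values ($r = 1$ and $r = n + m$) reproduce $S$ and $Q$ of the original spectrum, while the intermediate values differ from the original $R_r^{(n)}$, which is the non-invariance described above.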
6 Combinations invariant under the addition
of zero eigenvalues

Invariance under the addition of null dimensions is a rather essential property
of the von Neumann entropy. So if we are looking for generalizations of
entropy, we might reasonably insist on this invariance. We have just seen
that $R_r^{(n)}$ with $r = 2, \ldots, n - 1$ does not have this property, at least not in
any obvious sense, but it is interesting to ask whether we can use the $R_r^{(n)}$'s
to construct functions that are invariant in this way. In particular, for each
value of $n$ let us look at weighted averages of the $R_r^{(n)}$'s. That is, we ask
whether one can find functions $\mathcal{R}^{(n)}(\lambda_1, \ldots, \lambda_n)$ of the form

$$\mathcal{R}^{(n)} = \sum_{r=1}^{n} b_r^{(n)} R_r^{(n)} \tag{38}$$

with $b_r^{(n)} \ge 0$ and $\sum_r b_r^{(n)} = 1$, such that

$$\mathcal{R}^{(n+1)}(\lambda_1, \ldots, \lambda_n, 0) = \mathcal{R}^{(n)}(\lambda_1, \ldots, \lambda_n). \tag{39}$$

We will refer to such sets of functions as "augmentation-invariant," or for
brevity, simply "invariant."

Combining Eqs. (38) and (39), we see that the condition we want to
satisfy is

$$\sum_{r=1}^{n+1} b_r^{(n+1)} R_r^{(n+1)}(\lambda_1, \ldots, \lambda_n, 0) = \sum_{r=1}^{n} b_r^{(n)} R_r^{(n)}(\lambda_1, \ldots, \lambda_n). \tag{40}$$

But according to Eq. (37) with $m = 1$,

$$R_r^{(n+1)}(\lambda_1, \ldots, \lambda_n, 0) = \frac{n - r + 1}{n}\, R_r^{(n)}(\lambda_1, \ldots, \lambda_n) + \frac{r - 1}{n}\, R_{r-1}^{(n)}(\lambda_1, \ldots, \lambda_n). \tag{41}$$

Inserting this last relation into Eq. (40) and equating coefficients of $R_r^{(n)}$, we
get the following condition on the $b_r^{(n)}$'s:

$$(n - r + 1)\, b_r^{(n+1)} + r\, b_{r+1}^{(n+1)} = n\, b_r^{(n)}. \tag{42}$$
If $\mathcal{R}^{(n)}$ is to be augmentation-invariant, then Eq. (42) must be satisfied for
all pairs $(n, r)$ such that $n \ge 1$ and $1 \le r \le n$. Let us say that a set $b$
of non-negative values $b_r^{(n)}$ is a solution to the invariance problem if it is
normalized (that is, if $\sum_r b_r^{(n)} = 1$ for each $n$) and if it satisfies Eq. (42).
We aim to find all such solutions. Note that the normalization condition
$\sum_r b_r^{(n)} = 1$ is actually guaranteed by Eq. (42) for all values of $n$ if it is true
for any one value of $n$: summing Eq. (42) over $r$ gives us $\sum_r b_r^{(n+1)} = \sum_r b_r^{(n)}$.
Notice also that the set of solutions $b$ is convex: if $b$ and $b'$ are solutions, then
$p b + (1 - p) b'$ with $0 \le p \le 1$ is also a solution.

We begin by solving a slightly different problem, in which we restrict the
range of $n$ in Eq. (42) to $1 \le n < N$ for some integer $N$. For this restricted
problem, we note three facts: (i) The solution is completely determined by
the values of $b_r^{(N)}$, $r = 1, \ldots, N$; moreover every set of such values yields a
solution. (ii) Because the set of allowed values of the ordered set $(b_1^{(N)}, \ldots, b_N^{(N)})$
is compact, the set of solutions to the restricted problem is also compact. (iii)
The extreme points of the convex set of solutions are generated by choosing
$b_r^{(N)} = \delta_{r \tilde{r}}$, with $\tilde{r}$ in the range $1 \le \tilde{r} \le N$ and $\delta$ being the Kronecker delta;
that is, at the level $n = N$, we put all the weight on one value of $r$. Any
other normalized set of $b_r^{(N)}$'s can be obtained as a weighted average of these
special cases.

Remarkably, we can write down explicitly the solution to Eq. (42)
generated by $b_r^{(N)} = \delta_{r \tilde{r}}$:

$$b_r^{(n)} = \binom{n-1}{r-1} \binom{N-n}{\tilde{r}-r} \binom{N-1}{\tilde{r}-1}^{-1}. \tag{43}$$

One can verify that these $b_r^{(n)}$'s satisfy Eq. (42) for $n < N$, that they are
normalized, and that they take the values $\delta_{r \tilde{r}}$ for $n = N$. This solution has a
simple interpretation in basic probability theory: in a series of $N - 1$ tosses of
a coin, $b_r^{(n)}$ given by Eq. (43) is the probability of getting exactly $r - 1$ heads
in the first $n - 1$ tosses, given that in the full set of $N - 1$ tosses, the number
of heads is exactly $\tilde{r} - 1$. Again, any other solution of the restricted problem
can be obtained by taking weighted averages of the solutions presented in
Eq. (43).
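The three properties claimed for these weights can be verified exactly with rational arithmetic. The sketch below is illustrative: it takes the weights to be the hypergeometric probabilities described in the coin-tossing interpretation, $b_r^{(n)} = \binom{n-1}{r-1}\binom{N-n}{\tilde{r}-r}/\binom{N-1}{\tilde{r}-1}$, and checks the recursion $(n-r+1)\,b_r^{(n+1)} + r\,b_{r+1}^{(n+1)} = n\,b_r^{(n)}$, the normalization, and the terminal values. The particular $N$ and $\tilde{r}$ are arbitrary choices.

```python
from fractions import Fraction
from math import comb

def b(n, r, N, r_tilde):
    """Hypergeometric weight: probability of r - 1 heads in the first n - 1
    tosses, given r_tilde - 1 heads in all N - 1 tosses."""
    if r_tilde < r:
        return Fraction(0)  # more heads early on than in total is impossible
    return Fraction(comb(n - 1, r - 1) * comb(N - n, r_tilde - r),
                    comb(N - 1, r_tilde - 1))

N, r_tilde = 6, 3  # hypothetical choice of the terminal level and index

# Recursion relating level n + 1 to level n, for 1 <= n < N and 1 <= r <= n.
recursion_ok = all(
    (n - r + 1) * b(n + 1, r, N, r_tilde) + r * b(n + 1, r + 1, N, r_tilde)
    == n * b(n, r, N, r_tilde)
    for n in range(1, N)
    for r in range(1, n + 1)
)

# Normalization at every level, and Kronecker-delta values at level N.
normalized = all(
    sum(b(n, r, N, r_tilde) for r in range(1, n + 1)) == 1
    for n in range(1, N + 1)
)
terminal = [b(N, r, N, r_tilde) for r in range(1, N + 1)]

print(recursion_ok, normalized, terminal)
```

Because `Fraction` arithmetic is exact, the equalities are tested without any floating-point tolerance.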

We now return to the original problem, with no restriction on the value
of $n$. As in the case of the restricted problem, there will be a set of extreme
solutions from which all other solutions can be obtained as convex
combinations. We find these extreme solutions by taking the limit of Eq. (43) as
$N \to \infty$ and $\tilde{r} \to \infty$ while the ratio $\tilde{r}/N$ approaches some value $\alpha$ in the
range $0 \le \alpha \le 1$. This limit gives us the following basic solutions to the
invariance problem:

$$b_r^{(n)} = \binom{n-1}{r-1} \alpha^{r-1} (1 - \alpha)^{n-r}. \tag{44}$$



Again, one can verify directly that these $b_r^{(n)}$'s satisfy Eq. (42). As in the
restricted problem, this solution has a simple interpretation in terms of coin
tossing: $b_r^{(n)}$ as given by Eq. (44) is the probability of getting $r - 1$ heads in
$n - 1$ tosses if the probability of heads is $\alpha$. Returning now to Eq. (38) we
can identify, for each value of $\alpha$, the following invariant set of functions $\mathcal{R}_\alpha^{(n)}$:

$$\mathcal{R}_\alpha^{(n)}(\rho) = \sum_{r=1}^{n} \binom{n-1}{r-1} \alpha^{r-1} (1 - \alpha)^{n-r} R_r^{(n)}(\rho). \tag{45}$$

That is, by taking an average over $r$ of the functions $R_r^{(n)}$, with the weights
in the average given by a binomial distribution, one obtains a function that
is invariant under the addition of null dimensions. Moreover, these binomial
averages are the extreme cases. One can always generate other invariant
functions by taking convex combinations, but the binomial averages can be
regarded as the basic solutions. To put it in other words, one can find
invariant functions by weighting the $R_r^{(n)}$'s with broader distributions, but
not with narrower distributions.

As $\alpha$ increases from 0 to 1, the peak of the binomial distribution in
Eq. (45) moves toward larger values of $r$. Since we have already shown that
$R_r^{(n)}$ decreases (or remains unchanged) as $r$ increases, we see immediately
that $\mathcal{R}_\alpha^{(n)}$ is likewise non-increasing with increasing $\alpha$. For the extreme values
$\alpha = 0$ and $\alpha = 1$, we have $\mathcal{R}_0^{(n)} = S$ and $\mathcal{R}_1^{(n)} = Q$. Thus $\mathcal{R}_\alpha^{(n)}$ interpolates
continuously between $S$ and $Q$.
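The binomial average and its invariance can be illustrated numerically. The following sketch is not from the paper: it forms $\sum_r \binom{n-1}{r-1}\alpha^{r-1}(1-\alpha)^{n-r} R_r^{(n)}$ for a hypothetical spectrum with distinct eigenvalues, then recomputes it after appending a zero eigenvalue. The function names and the grid of $\alpha$ values are illustrative choices.

```python
import math
from itertools import combinations

def R(lams, r):
    """R_r^(n) from the subset sum; nonzero eigenvalues assumed distinct."""
    n = len(lams)
    total = 0.0
    for subset in combinations(lams, r):
        for s, ls in enumerate(subset):
            if ls == 0:
                continue  # 0 * ln 0 = 0
            weight = 1.0
            for t, lt in enumerate(subset):
                if t != s:
                    weight *= ls / (ls - lt)
            total += weight * ls * math.log(ls)
    return -total / math.comb(n - 1, r - 1)

def R_alpha(lams, alpha):
    """Binomially weighted average of the R_r^(n) over r = 1, ..., n."""
    n = len(lams)
    return sum(
        math.comb(n - 1, r - 1) * alpha ** (r - 1) * (1 - alpha) ** (n - r) * R(lams, r)
        for r in range(1, n + 1)
    )

lams = [0.5, 0.3, 0.2]  # hypothetical spectrum
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
table = [(a, R_alpha(lams, a), R_alpha(lams + [0.0], a)) for a in alphas]
for a, plain, padded in table:
    print(f"alpha = {a:.2f}: {plain:.6f} (n = 3), {padded:.6f} (n = 4 with a zero)")
```

The two columns should coincide for every $\alpha$, the values should fall from $S$ at $\alpha = 0$ to $Q$ at $\alpha = 1$, and none of this requires knowing $n$ in advance beyond the number of nonzero eigenvalues.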

Just as $S$ and $Q$ can be written as contour integrals, it turns out that
$\mathcal{R}_\alpha^{(n)}$ can be written in a similar way: one can show that

$$\mathcal{R}_\alpha^{(n)}(\rho) = -\frac{1}{2\pi i\, \alpha} \oint (\ln z)\, \det \left\{ [I - (1 - \alpha)\rho/z]\, [I - \rho/z]^{-1} \right\} dz, \tag{46}$$

where the value at $\alpha = 0$ is determined by taking the limit. In this form, it is
quite easy to see that $\mathcal{R}_\alpha^{(n)}$ is invariant under the addition of null dimensions.
The eigenvalues of the matrix whose determinant we are taking in Eq. (46)
can be written as

$$\text{eigenvalues} = (1 - \alpha) + \alpha \Big( \frac{z}{z - \lambda_j} \Big), \tag{47}$$

where as always, the $\lambda_j$'s are the eigenvalues of $\rho$. If any of the $\lambda_j$'s are zero,
they contribute a factor of 1 to the determinant and can thus be ignored in
calculating the value of $\mathcal{R}_\alpha^{(n)}$. [The form (47) is also particularly convenient
for deriving Eq. (46).] Because of the augmentation-invariance, we can drop
the superscript $n$ and refer unambiguously to $\mathcal{R}_\alpha$. We could also use the
contour integral (46), which contains no explicit reference to $n$, as an alternative
definition.



7 Discussion 

We have identified and studied various functions that lie between the entropy
$S$ and the subentropy $Q$. Our first set of such functions $R_r^{(n)}$ emerged as a
natural mathematical generalization of Eqs. (4) and (5), and also turned out
to be generalizations of the alternative expression (11) for $Q$ as an integral
over the probability simplex. These functions share certain properties with
entropy (they are concave, they take the value zero when all but one of the
eigenvalues of $\rho$ are zero, and they take their maximum value when all the
eigenvalues are equal), but unlike entropy they do not remain unchanged
when one includes additional dimensions corresponding to zero eigenvalues
of $\rho$.

The related functions $\mathcal{R}_\alpha$ are weighted averages of the $R_r^{(n)}$'s and
therefore share the properties just listed, but in addition they are invariant under
the inclusion of null dimensions. Moreover they are the most basic
functions having this property: other augmentation-invariant functions can be
obtained as convex combinations of the $\mathcal{R}_\alpha$'s.

One consequence of this invariance is a very modest kind of additivity.
Let $\rho_1$ be an arbitrary density matrix of some quantum system and let
$\rho_2$ be the density matrix of a pure state of another system. Then for any
$\alpha$ in the range $0 \le \alpha \le 1$, we can say

$$\mathcal{R}_\alpha(\rho_1 \otimes \rho_2) = \mathcal{R}_\alpha(\rho_1) + \mathcal{R}_\alpha(\rho_2). \qquad (48)$$

This statement follows from the augmentation-invariance of $\mathcal{R}_\alpha$
along with two simple facts: (i) $\rho_1$ and $\rho_1 \otimes \rho_2$ have the same
nonzero eigenvalues, and (ii) $\mathcal{R}_\alpha(\rho_2) = 0$. On the other hand,
for arbitrary $\rho_1$ and $\rho_2$, $\mathcal{R}_\alpha$ is not additive except when
$\alpha = 0$, in which case $\mathcal{R}_0$ is the entropy itself.
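
The eigenvalue argument behind Eq. (48) can be checked numerically for the two quantities that the intermediate functions lie between, the entropy $S$ and the subentropy $Q$. The following is only a sketch under stated assumptions: it works with eigenvalue lists rather than matrices (only the spectra matter here), it uses the standard closed-form expression for subentropy, which requires the nonzero eigenvalues to be distinct, and the function names are illustrative rather than taken from the paper. It does not evaluate the intermediate functions themselves.

```python
import math
from itertools import product

def entropy(eigs):
    """Von Neumann entropy -sum(x ln x), with 0 ln 0 taken as 0."""
    return -sum(x * math.log(x) for x in eigs if x > 0)

def subentropy(eigs, tol=1e-12):
    """Subentropy from the closed form
    Q = -sum_j [prod_{k != j} x_j / (x_j - x_k)] x_j ln x_j.
    A zero eigenvalue contributes a factor x_j / (x_j - 0) = 1 to every
    other term and its own term vanishes, so zeros are dropped first.
    Assumption: the remaining eigenvalues are distinct."""
    nz = [x for x in eigs if x > tol]
    total = 0.0
    for j, xj in enumerate(nz):
        weight = 1.0
        for k, xk in enumerate(nz):
            if k != j:
                weight *= xj / (xj - xk)
        total += weight * xj * math.log(xj)
    return -total

def tensor_spectrum(eigs1, eigs2):
    """Eigenvalues of a tensor product: all pairwise products."""
    return [x * y for x, y in product(eigs1, eigs2)]

rho1 = [0.5, 0.3, 0.2]   # arbitrary mixed state (illustrative values)
pure = [1.0, 0.0]        # pure state of a second system
joint = tensor_spectrum(rho1, pure)

# With one factor pure, both quantities are additive (up to rounding).
for name, f in (("S", entropy), ("Q", subentropy)):
    print(name, f(joint), f(rho1) + f(pure))

# With two mixed factors, S is still additive but Q is not.
a, b = [0.7, 0.3], [0.6, 0.4]
ab = tensor_spectrum(a, b)
print("S", entropy(ab), entropy(a) + entropy(b))
print("Q", subentropy(ab), subentropy(a) + subentropy(b))
```

In the second comparison the subentropy of the product falls short of the sum. This must happen for these inputs, because the sum exceeds the largest value $Q$ can take in four dimensions.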

Does either the first set of functions or $\mathcal{R}_\alpha$ have a physical
meaning? At this point we have no definite interpretation of either of these
quantities, though because of its nice mathematical properties we have more
hope for $\mathcal{R}_\alpha$. Here we suggest one way in which this quantity
might play a role in quantum information theory.

Consider once again an ensemble $\mathcal{E} = \{(|\psi_i\rangle, p_i)\}$ of pure
states of a quantum particle, and suppose that one is trying to convey
classical information by sending a sequence of states chosen from this
ensemble, with frequencies of occurrence asymptotically equal to the given
probabilities $p_i$. If the receiver (Bob) is required to measure each particle
individually, then the maximum amount of information that the sender (Alice)
can convey per particle is the accessible information of the ensemble
$\mathcal{E}$. Suppose, though, that Bob is able to measure pairs of particles
jointly. Then Alice can hope to convey more information per particle by
encoding her message in codewords consisting of pairs of the original states;
that is, each codeword is of the form $|\psi_{i_1}\rangle \otimes |\psi_{i_2}\rangle$
with $|\psi_{i_1}\rangle$ and $|\psi_{i_2}\rangle$ chosen from $\mathcal{E}$. We insist
that Alice respect the original probabilities of $\mathcal{E}$ in the sense that
in a long message, each state $|\psi_i\rangle$ is used with a frequency
approximating $p_i$. One finds that Alice often can increase the information
conveyed per particle by using this strategy [10], [11], [12], [13].
Moreover, by continuing to increase the length of the codewords, assuming
that Bob can make arbitrary joint measurements on a whole codeword, Alice
can convey even more information. Let $I_m$ be the amount of information one
can convey per particle when the codeword length is $m$. The limiting value
of $I_m$ for arbitrarily long codewords is simply $S(\rho)$, where $\rho$ is the
density matrix of the ensemble $\mathcal{E}$.



In the first stage of the above scenario, when Bob can measure only
individual particles, we know that $Q(\rho)$ is a lower bound on the information
that can be conveyed per particle. As the codeword length increases to
infinity, $I_m$ increases to $S(\rho)$. One is led to speculate that for
intermediate codeword lengths, $\mathcal{R}_\alpha(\rho)$ may play a role. For
example, it is conceivable that when Alice and Bob are using codewords of
length $m$, $\mathcal{R}_\alpha(\rho)$ is a lower bound on $I_m$, where
$\alpha = e^{-c(m-1)}$ for some universal constant $c$. As $m$ approaches
infinity, then, the lower bound would approach $\mathcal{R}_0(\rho) = S(\rho)$, as
it should.
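
As a quick check of the two endpoints of this speculative schedule, assuming $c > 0$ (which the stated limit requires):

```latex
\alpha(m) = e^{-c(m-1)}, \qquad
\alpha(1) = e^{0} = 1, \qquad
\lim_{m \to \infty} \alpha(m) = 0 \quad (c > 0).
```

So $\alpha$ decreases monotonically from 1 to 0 as $m$ grows and never leaves the interval on which the functions are defined. At $m = 1$ the speculated bound is the $\alpha = 1$ member of the family, which is to be compared with the known single-particle bound $Q(\rho)$.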

We can extend this idea to the study of the classical capacity of a quantum
channel. At present one does not have a simple way of calculating this
capacity for all channels, only because it is not known whether the amount
of information conveyed can be increased by using inputs that are entangled
between different uses of the channel. If we disallow entangled inputs,
then the resulting capacity, called the Holevo capacity, is given by a simple
expression: it is the maximum, over all input ensembles, of the
quantity $S(\rho) - \sum_i p_i S(\rho_i)$. Here $\{(\rho_i, p_i)\}$ is the output
ensemble, and $\rho$ is its average density matrix $\sum_i p_i \rho_i$. As in the
preceding paragraph, achieving this capacity requires that Bob be able to
make joint measurements on arbitrarily long blocks. But suppose that Bob
cannot make such measurements; suppose that he can measure only blocks of
size $m$. For the case $m = 1$, it is known that the information $I_1$ that he
can gain per particle is bounded below by
$\max[Q(\rho) - \sum_i p_i Q(\rho_i)]$, the maximum being over all input
ensembles. Just as in the preceding paragraph, we can speculate that for
arbitrary $m$, the information $I_m$ that one can convey per use of the channel
is bounded below by
$\max[\mathcal{R}_\alpha(\rho) - \sum_i p_i \mathcal{R}_\alpha(\rho_i)]$, with
$\alpha$ given by $\alpha = e^{-c(m-1)}$. Of course this statement is quite
speculative and we would not even want to claim it as a conjecture. We
present it only to suggest how the quantity $\mathcal{R}_\alpha$ might
conceivably be applied.
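
To make the two known expressions concrete, the sketch below evaluates $S(\rho) - \sum_i p_i S(\rho_i)$ and $Q(\rho) - \sum_i p_i Q(\rho_i)$ for one fixed output ensemble of qubit states. Everything specific here is an assumption made for illustration: the states are described by Bloch vectors, the ensemble values are arbitrary, the function names are hypothetical, and no maximization over input ensembles is attempted, so these are not capacities.

```python
import math

def qubit_spectrum(bloch):
    """Eigenvalues (1 + r)/2 and (1 - r)/2 for a Bloch vector of length r."""
    r = math.sqrt(sum(x * x for x in bloch))
    return (1 + r) / 2, (1 - r) / 2

def entropy(eigs):
    """Von Neumann entropy, with 0 ln 0 taken as 0."""
    return -sum(x * math.log(x) for x in eigs if x > 0)

def subentropy(eigs):
    """Two-level subentropy: Q = -(a^2 ln a - b^2 ln b) / (a - b)."""
    a, b = eigs
    if abs(a - b) < 1e-9:
        return math.log(2) - 0.5      # limiting value at a = b = 1/2
    f = lambda x: x * x * math.log(x) if x > 0 else 0.0
    return -(f(a) - f(b)) / (a - b)

def ensemble_gain(func, probs, blochs):
    """func(average state) minus the ensemble average of func(state)."""
    # The Bloch vector of the average state is the average Bloch vector.
    mean = [sum(p * v[i] for p, v in zip(probs, blochs)) for i in range(3)]
    avg = sum(p * func(qubit_spectrum(v)) for p, v in zip(probs, blochs))
    return func(qubit_spectrum(mean)) - avg

# Illustrative output ensemble: two mixed qubit states, equally likely.
probs = [0.5, 0.5]
blochs = [(0.0, 0.0, 0.8), (0.8, 0.0, 0.0)]

chi = ensemble_gain(entropy, probs, blochs)        # entropy-based quantity
q_gain = ensemble_gain(subentropy, probs, blochs)  # subentropy-based quantity
print(chi, q_gain)
```

For this ensemble the subentropy-based quantity is positive and smaller than the entropy-based one, as expected for a lower and an upper bound on the same single-particle information.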

What we do have at present is a set of functions that share some
mathematical properties with entropy and subentropy. There is a certain
elegance in the mathematics, but whether this elegance translates into value
for physics remains to be seen.